ABSTRACT

A voice activated camera is described which allows users to take remote photographs by speaking one or more keywords. In a preferred embodiment, a speech processing unit is provided which is arranged to detect extended periodic signals from a microphone of the camera. A control unit is also provided to control the taking of a photograph when such an extended periodic component is detected by the speech processing unit.

As shown in FIG. 2, the camera 3 also includes a microphone 39 for converting a user's speech into corresponding electrical speech signals; and a speech processing unit 41 which processes the electrical speech signals to detect the presence of a keyword in the user's speech and which informs the camera control unit 33 accordingly.

Speech Processing Unit

As discussed above, the speech processing unit 41 is arranged to detect keywords spoken by the user in order to control the taking of remote photographs. In this embodiment, the speech processing unit does not employ a "conventional" automatic speech recognition type keyword spotter which compares the spoken speech with stored models to identify the presence of one of the keywords. Instead, the speech processing unit 41 used in this embodiment is arranged to detect a sustained periodic signal within the input speech, such as would occur if the user says the word "cheeeese" or some other similar word. The inventor has found that because of the strong periodic nature of such a sustained vowel sound, the speech processing unit 41 can still detect the sound even at very low signal-to-noise ratios.

The way in which the speech processing unit 41 operates in this embodiment will now be explained with reference to FIGS. 3 to 7.

FIG. 3 illustrates the main functional blocks of the speech processing unit 41 used in this embodiment. The input signal (S(t)) received from the microphone 39 is sampled (at a rate of just over 11 kHz) and digitised by an analogue-to-digital (A/D) converter 101. Although not shown, the speech processing unit 41 will also include an anti-aliasing filter before the A/D converter 101, to prevent aliasing effects occurring due to the sampling. The sampled signal is then filtered by a bandpass filter 103 which removes unwanted frequency components. Since voiced sounds (as opposed to fricative sounds) are generated by the vibration of the user's vocal cords, the smallest fundamental frequency (pitch) of the periodic signal to be detected is approximately 100 Hertz. The bandpass filter 103 is therefore arranged to remove frequency components below 100 Hertz, which will not contribute to the desired periodic signal. Also, the bandpass filter 103 is arranged to remove frequencies above 500 Hertz, which reduces broadband noise from the signal and therefore improves the signal-to-noise ratio. The input speech is then divided into non-overlapping equal length frames of speech samples by a framing unit 105. In particular, in this embodiment the framing unit 105 extracts a frame of speech samples every 23 milliseconds. With the sampling rate used in this embodiment, this results in each frame having 256 speech samples. FIG. 4 illustrates the sampled speech signal (S(n), shown as a continuous signal for ease of illustration) and the way that the speech signal is divided into non-overlapping frames.

As shown in FIG. 3, each frame $f_i$ of speech samples is then processed by a frame periodicity determining unit 107 which processes the speech samples within the frame to calculate a measure ($v_i$) of the degree of periodicity of the speech within the frame. A high degree of periodicity within a frame is indicative of a voiced sound when the vocal cords are vibrating. A low degree of periodicity is indicative of noise or fricative sounds. The calculated periodicity measure ($v_i$) is then stored in a first-in-first-out buffer 109. In this embodiment, the buffer 109 can store frame periodicity measures for forty-four consecutive frames, corresponding to just over one second of speech. Each time a new frame periodicity measure is added to the buffer 109, an extended periodicity determining unit 111 processes all of the forty-four periodicity measures in the buffer 109 to determine whether or not a sustained periodic sound is present within the speech represented by the forty-four frames.

When the extended periodicity determining unit 111 detects a sustained periodic sound within the speech signal, it passes a signal to the camera control unit 33 confirming the detection. As discussed above, the camera control unit 33 then controls the operation of the camera 3 to take the photograph at the appropriate time.

Frame Periodicity Determining Unit

As those skilled in the art will appreciate, various techniques can be used to determine a measure of the periodicity of the speech within each speech frame. However, the main components of the particular frame periodicity determining unit 107 used in this embodiment are shown in FIG. 5. As shown, the frame periodicity determining unit 107 includes an auto-correlation determining unit 1071 which receives the current speech frame $f_i$ from the framing unit 105 and which determines the auto-correlation of the speech samples within the frame. In particular, the auto-correlation determining unit 1071 calculates the following function:

$$A(L) = \frac{1}{N-L} \sum_{j=0}^{N-L-1} x(j)\, x(j+L) \qquad (1)$$

where $x(j)$ is the j-th sample within the current frame, $N$ is the number of samples in the frame, j=0 to N-1 and L=0 to N-1.

The value of A(L) for L=0 is equal to the signal energy and for L>0 it corresponds to shifting the signal by L samples and correlating it with the original signal. A periodic signal shows strong peaks in the auto-correlation function for values of L that are multiples of the pitch period. In contrast, non-periodic signals do not have strong peaks.

FIG. 6 shows the auto-correlation function ($A_i(L)$) for a frame of speech $f_i$ representing a speech signal which is periodic and which repeats approximately every 90 samples. As shown in FIG. 6, the auto-correlation function has peaks around L=90 and around L=180. Further, the peak value is approximately the same as the value at L=0, indicating that the signal is strongly periodic.

The fundamental frequency or pitch of voiced speech signals varies between 100 and 300 Hertz. Therefore, a peak in the auto-correlation function is expected between $L_{LOW} = F_s/300$ and $L_{HIGH} = F_s/100$, where $F_s$ is the sampling frequency of the input speech signal. Consequently, in this embodiment, the auto-correlation function output by the auto-correlation determining unit 1071 is input to a peak determining unit 1073 which processes the auto-correlation values between $A(L_{LOW})$ and $A(L_{HIGH})$ to identify the peak value ($A(L_{MAX})$) within this range. In this embodiment, with a sampling rate of just over 11 kHz, the value of $L_{LOW}$ is 37 and the value of $L_{HIGH}$ is 111. This search range of the peak determining unit 1073 is illustrated in FIG. 6 by the vertical dashed lines, which also shows the peak occurring at $L_{MAX}=90$. The auto-correlation values A(0) and $A(L_{MAX})$ are then passed from the peak determining unit 1073 to a periodicity measuring unit 1075 which is arranged to generate a normalised frame periodicity measure for the current frame ($f_i$) by calculating:

$$v_i = \frac{A_i(L_{MAX})}{A_i(0)}$$

This measure is close to one for a strongly periodic signal and close to zero for a non-periodic signal.
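The framing arithmetic described above can be checked with a short sketch. The exact sampling rate is not stated in the text (only "just over 11 kHz"), so the value 11130 Hz used here is an assumption chosen because it yields 256 samples per 23 millisecond frame; the function names are illustrative and not taken from the source.

```python
# Sketch of the framing step performed by framing unit 105.
SAMPLE_RATE_HZ = 11130   # assumption: "just over 11 kHz"; exact rate is not given
FRAME_PERIOD_S = 0.023   # one frame every 23 milliseconds (from the text)
BUFFER_FRAMES = 44       # measures held in buffer 109 (from the text)


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_period_s=FRAME_PERIOD_S):
    """Number of samples in one frame at the given rate and frame period."""
    return round(sample_rate_hz * frame_period_s)


def split_into_frames(samples, frame_len):
    """Split samples into non-overlapping frames, dropping any incomplete tail."""
    usable = len(samples) - len(samples) % frame_len
    return [samples[start:start + frame_len] for start in range(0, usable, frame_len)]


frame_len = samples_per_frame()                    # 256 samples per frame
buffer_seconds = BUFFER_FRAMES * FRAME_PERIOD_S    # just over one second
frames = split_into_frames(list(range(1000)), frame_len)
```

With these assumed values, forty-four frames span roughly 1.012 seconds, which agrees with the "just over one second" stated for the buffer 109.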

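The auto-correlation, peak search and normalised ratio described for units 1071, 1073 and 1075 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 1/(N-L) normalisation of the auto-correlation, the 11130 Hz sampling rate, the rounding used to obtain the lag limits, and all function names are assumptions.

```python
import math
import random

SAMPLE_RATE_HZ = 11130   # assumption: "just over 11 kHz"
FRAME_LEN = 256          # samples per frame (from the text)


def autocorrelation(frame, lag):
    """A(L): mean product of the frame with itself shifted by `lag` samples.

    The 1/(N-L) normalisation is an assumed form of the function.
    """
    n = len(frame)
    total = sum(frame[j] * frame[j + lag] for j in range(n - lag))
    return total / (n - lag)


def lag_search_range(sample_rate_hz=SAMPLE_RATE_HZ):
    """Lags corresponding to pitch between 300 Hz (shortest) and 100 Hz (longest)."""
    return round(sample_rate_hz / 300), round(sample_rate_hz / 100)


def frame_periodicity(frame, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (v, l_max), where v = A(l_max) / A(0) within the pitch lag range."""
    l_low, l_high = lag_search_range(sample_rate_hz)
    l_high = min(l_high, len(frame) - 1)
    energy = autocorrelation(frame, 0)
    if energy <= 0.0:
        return 0.0, l_low                     # silent frame: no periodicity
    values = {lag: autocorrelation(frame, lag) for lag in range(l_low, l_high + 1)}
    l_max = max(values, key=values.get)       # lag of the strongest peak
    return values[l_max] / energy, l_max


# Demonstration: a tone repeating every 90 samples versus Gaussian noise.
periodic_frame = [math.sin(2 * math.pi * j / 90) for j in range(FRAME_LEN)]
rng = random.Random(0)
noise_frame = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

v_periodic, l_periodic = frame_periodicity(periodic_frame)
v_noise, _ = frame_periodicity(noise_frame)
```

For the synthetic tone, the peak lag lands at or next to 90 samples and the measure is close to one, while the noise frame gives a much smaller value. Because each lag is averaged over a different number of samples, the ratio for a short frame can slightly exceed one.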

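The first-in-first-out buffer 109 and the extended periodicity determining unit 111 can be sketched in the same way. This section does not say how unit 111 decides that a sustained periodic sound is present, so the rule below (count the frames whose measure exceeds a threshold) is purely hypothetical, as are the threshold values and the class name.

```python
from collections import deque

BUFFER_FRAMES = 44        # from the text: buffer 109 holds forty-four measures
VOICED_THRESHOLD = 0.6    # hypothetical: a frame counts as periodic above this
MIN_VOICED_FRAMES = 30    # hypothetical: frames needed to call the sound sustained


class ExtendedPeriodicityDetector:
    """Hypothetical stand-in for buffer 109 plus determining unit 111."""

    def __init__(self, size=BUFFER_FRAMES, threshold=VOICED_THRESHOLD,
                 min_voiced=MIN_VOICED_FRAMES):
        self.buffer = deque(maxlen=size)   # oldest measure is dropped automatically
        self.threshold = threshold
        self.min_voiced = min_voiced

    def push(self, v):
        """Add one frame measure; return True if a sustained sound is deemed present."""
        self.buffer.append(v)
        if len(self.buffer) < self.buffer.maxlen:
            return False                   # wait until the buffer is full
        voiced = sum(1 for value in self.buffer if value > self.threshold)
        return voiced >= self.min_voiced


detector = ExtendedPeriodicityDetector()
quiet_results = [detector.push(0.1) for _ in range(44)]    # background noise
voiced_results = [detector.push(0.9) for _ in range(30)]   # sustained vowel
```

The decision is re-evaluated each time a new measure arrives, matching the description that all forty-four measures are processed whenever the buffer is updated. A return value of True corresponds to the point at which a signal would be passed to the camera control unit 33.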

